Why neural networks should not be used for HIV-1 protease cleavage site prediction
Abstract
Several papers have been published where nonlinear machine learning algorithms, e.g. artificial neural networks, support vector machines and decision trees, have been used to model the specificity of the HIV-1 protease and extract specificity rules. We show that the dataset used in these studies is linearly separable and that it is a misuse of nonlinear classifiers to apply them to this problem. The best solution on this dataset is achieved using a linear classifier like the simple perceptron or the linear support vector machine, and it is straightforward to extract rules from these linear models. We identify key residues in peptides that are efficiently cleaved by the HIV-1 protease and list the most prominent rules, relating them to experimental results for the HIV-1 protease.

MOTIVATION: Understanding HIV-1 protease specificity is important when designing HIV inhibitors and several different machine learning algorithms have been applied to the problem. However, little progress has been made in understanding the specificity because nonlinear and overly complex models have been used.

RESULTS: We show that the problem is much easier than what has previously been reported and that linear classifiers like the simple perceptron or linear support vector machines are at least as good predictors as nonlinear algorithms. We also show how sets of specificity rules can be generated from the resulting linear classifiers.

AVAILABILITY: The datasets used are available at http://www.hh.se/staff/bioinf/
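The idea of training a simple perceptron and reading rules off its weights can be illustrated with a minimal sketch. It assumes an orthogonal (one-hot) encoding of octapeptides, which the abstract does not specify. The peptides and the labelling rule are synthetic placeholders chosen only so that the toy data is linearly separable; they are not the dataset from the paper and carry no biological meaning. All function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LEN = 8  # assumption: peptides are octapeptides


def encode(peptide):
    """Orthogonal (one-hot) encoding: 8 positions x 20 residues = 160 inputs."""
    x = [0.0] * (PEPTIDE_LEN * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        x[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return x


def train_perceptron(samples, labels, max_epochs=1000):
    """Classic perceptron rule with labels in {-1, +1}.

    Returns (weights, bias, converged). converged is True only if a full
    pass over the data produced no errors, i.e. the training set was
    separated by a hyperplane.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


def predict(w, b, peptide):
    """Return +1 or -1 for a single peptide string."""
    x = encode(peptide)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def top_rules(w, k=3):
    """The k largest positive weights as (position index, residue, weight)."""
    n = len(AMINO_ACIDS)
    ranked = sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:k]
    return [(i // n, AMINO_ACIDS[i % n], w[i]) for i in ranked]


# Synthetic toy data (NOT the paper's dataset): random octapeptides with an
# arbitrary placeholder labelling rule that is linear in the one-hot inputs.
rng = random.Random(0)
peptides = ["".join(rng.choice(AMINO_ACIDS) for _ in range(PEPTIDE_LEN))
            for _ in range(200)]
labels = [1 if p[3] in "FLY" else -1 for p in peptides]

weights, bias, converged = train_perceptron([encode(p) for p in peptides], labels)
print("linearly separated:", converged)
print("strongest positive weights:", top_rules(weights))
```

Because each weight belongs to exactly one (position, residue) pair, large positive or negative weights can be read directly as candidate rules. This is the sense in which rule extraction from a linear model is straightforward; whether a given weight reflects real specificity would still need checking against experimental data.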
Related resources
Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction
It is crucial to understand the specificity of HIV-1 protease for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and find the important positions in an octapeptide that determined its cleavability. Two kinds of newly proposed features based on Amino Ac...
Reduced Bio-basis Function Neural Networks for Protease Cleavage Site Prediction
This paper presents a new neural learning algorithm for protease cleavage site prediction. The basic idea is to replace the radial basis function used in radial basis function neural networks by a so-called bio-basis function using amino acid similarity matrices. Mutual information is used to select bio-bases and a corresponding selection algorithm is developed. The algorithm has been applied t...
Characterizing proteolytic cleavage site activity using bio-basis function neural networks
MOTIVATION: In protein chemistry, proteomics and biopharmaceutical development, there is a desire to know not only where a protein is cleaved by a protease, but also the susceptibility of its cleavage sites. The current tools for proteolytic cleavage prediction have often relied purely on regular expressions, or involve models that do not represent biological data well. RESULTS: A novel methodo...
Mining association rules for HIV-1 protease cleavage site prediction
Several machine learning techniques, like neural networks, nonlinear support vector machines and decision trees, have been used to model the specificity of HIV-1 protease and to extract specific patterns from peptides cleaved by this protease. Despite many studies, no perfect rules are already known to determine the cleavage of a peptide by HIV-1 protease. These rules are useful for designing s...
Comprehensive bioinformatic analysis of the specificity of human immunodeficiency virus type 1 protease.
Rapidly developing viral resistance to licensed human immunodeficiency virus type 1 (HIV-1) protease inhibitors is an increasing problem in the treatment of HIV-infected individuals and AIDS patients. A rational design of more effective protease inhibitors and discovery of potential biological substrates for the HIV-1 protease require accurate models for protease cleavage specificity. In this s...
Journal title:
- Bioinformatics
Volume 20, Issue 11
Pages -
Publication date: 2004